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Higher- Dimensional  Signal  Processing  via  Multiscale  Geometric  Analysis 

Final  Report 

Richard  G.  Baraniuk  and  Hyeokho  Choi 


This  project  pursued  a  general  theory  for  complex-valued  multiscale  signal  and  image  modeling,  processing, 
and  analysis  that  is  matched  to  singularity-rich  data.  Higher-dimensional  signals  that  feature  geometric  manifold 
structures  were  of  particular  interest  in  developing  theory  and  a  practical  toolset  for  analysis  and  processing.  We 
pursued  a  three-pronged  approach  in  creating  new  multiscale  transforms,  new  geometric  statistical  models,  and  new 
manifold-based  signal  representations.  The  results  of  our  research  include  (1)  the  Dual  Tree  Quaternion  Wavelet,  an 
efficient  transform  and  analysis  tool  that  features  near  shift  invariance  and  linear  computational  complexity;  (2)  a 
geometric  hidden  Markov  tree  wavelet  model,  which  accounts  for  geometric  regularity  by  capturing  the  dependencies 
between  complex  wavelet  coefficients  along  a  contour;  and  (3)  surflet  representations  of  signal  discontinuities  with 
near-optimal  rate-distortion  performance.  These  new  tools  have  led  to  significant  performance  gains  immediately 
applicable  to  a  number  of  important  Navy-relevant  applications,  including  target  detection  and  classification,  image 
segmentation  and  fusion,  and  computer  network  traffic  modeling. 


1  Introduction 

This  is  the  final  report  for  ONR  Grant  N00014-02- 1-0353  Coherent  Multiscale  Statistical  Modeling  using  Complex 
Wavelets  and  its  one-year  competitive  renewal.  Higher- Dimensional  Signal  Processing  via  Multiscale  Ceometric 
Analysis.  We  begin  by  restating  the  motivation  for  our  work,  reviewing  the  project  objectives,  and  summarizing 
our  work.  We  present  our  results  and  follow  each  research  thrust  with  potential  future  areas  of  work.  We  conclude 
with  a  list  of  publications  supported  by  the  grant,  and  a  list  of  project  personnel. 

1.1  Review  of  motivation 

Over  the  past  twenty  years  multiscale  methods  like  the  discrete  wavelet  transform  (DWT)  have  revolutionized  signal 
processing.  Wavelets  are  particularly  apt  for  representing  singularity-rich  one-dimensional  (1-D)  signals.  In  contrast 
to  the  sinusoidal  basis  functions  underlying  Fourier  analysis,  wavelet  basis  functions,  or  atoms,  are  localized  in  both 
space  and  frequency  and  thus  yield  sparse  and  structured  representations  of  piecewise  smooth  signals  (signals  that 
are  smooth  except  for  a  finite  number  of  localized  point  singularities  or  jump  discontinuities).  This  sparsity  and 
structure  boost  the  performance  of  wavelet-domain  statistical  models  and  enable  simple  yet  powerful  algorithms  for 
estimation/denoising,  compression,  classification,  and  segmentation. 

Despite  the  remarkable  advantages  of  wavelets  for  analyzing  and  processing  1-D  signals,  a  surprising  realization 
of  the  past  few  years  is  the  inability  of  wavelets  to  capitalize  in  a  similar  way  on  2-D  images  containing  sharp  edges. 
The  new,  confounding  aspect  of  edge  singularities  is  geometry:  the  jump  discontinuity  making  up  a  typical  edge 
is  localized  along  a  smooth  1-D  contour  in  2-D.  The  DWT  is  incapable  of  exploiting  the  fact  that  the  contour  is 
smooth  and  thus  predictable,  and  as  a  result  it  takes  too  many  wavelets  to  build  up  the  edge,  destroying  any  hope 
of  sparsity  and  optimality  for  image  processing. 

The  response  from  the  computational  harmonic  analysis  (CHA)  community  has  been  a  new  generation  of  multi¬ 
scale  geometric  analysis  (MCIA)  toolsfor  2-D  piecewise  smooth  functions,  including  wedgelets,  ridgelets,  curvelets, 
contourlets,  bandelets,  wedgeprints,  and  complex  wavelets.  In  order  to  efficiently  deal  with  the  local  geometrical 
structure  of  image  edges,  these  representations  feature  atoms  that  are  both  local  and  directional  (oriented).  This 
programme  in  2-D  has  been  quite  successful;  most  of  the  above  tools  provide  optimal  representations  for  piecewise 
smooth  images  with  smooth  (C^)  edges. 

However,  progress  has  not  continued  apace  in  higher  dimensions  (3-D,  4-D,  and  beyond)  even  though  there 
are  an  increasing  number  of  important  and  emerging  applications  in  3-D  (video;  computer  graphics  and  modeling; 
volumetric,  hyperspectral,  seismic,  and  astronomical  imaging  and  remote  sensing;  and  robotics),  4-D  (light-held 
imaging,  time-lapse  seismic  imaging),  and  5-D  (plenoptic  modeling,  light-held  video,  virtual  reality).  The  reason 
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for  this  lack  of  progress  is  that  the  geometrical  structure  of  singularities  in  3-D  and  above  is  not  merely  a  simple 
extension  of  that  of  edge  singularities  in  2-D.  The  added  complexity  in  higher-  dimensions  is  that  singularities  can 
themselves  be  multidimensional.  Consider  as  a  prototype  n-D  piecewise  smooth  signals  that  are  smooth  everywhere 
except  for  singularities  along  smooth  m-D  manifolds  (rods,  sheets,  and  so  on),  with  0  <  m  <  n.  For  instance,  an 
edge  discontinuity  in  a  2-D  image  lies  along  a  1-D  curve;  an  aircraft  flying  through  3-D  space  tracks  out  a  smooth 
1-D  curve  in  4-D  space-time;  a  video  of  a  smooth  object  moving  smoothly  in  time  lies  on  a  2-D  manifold  in  3-D 
space-time;  and  points  in  space  can  be  viewed  as  lying  on  0-D  manifolds  in  3-D  space. 

Manifold  singularity  structures  are  local  and  have  a  preferred  orientation,  they  are  smooth  in  the  “direction” 
of  the  manifold  (along  the  m-D  tangent  space)  and  contain  a  singularity  (rapid  change)  in  the  “direction”  of  the 
normal  (along  the  (n  —  m)-D  normal  space).  In  order  to  carry  the  success  of  MGA  into  higher  dimensions,  there  is 
a  great  need  for  new  theory  and  tools  to  exploit  these  geometric  structures.  Other  than  a  few  scattered  promising 
results,  such  theory  and  tools  do  not  exist  today. 

1.2  Project  overview 

1.2.1  Project  objectives  and  accomplishments 

This  project  pursued  a  general  theory  for  complex-valued  multiscale  signal  and  image  modeling,  processing,  and 
analysis  that  is  matched  to  singularity-rich  data.  Higher-dimensional  signals  that  feature  geometric  manifold  struc¬ 
tures  were  of  particular  interest  in  developing  theory  and  a  practical  toolset  for  analysis  and  processing.  The  three 
primary  thrusts  of  our  research  were: 

•  New  multiscale  transforms:  We  attacked  a  specific  and  important  drawback  of  current  wavelet  grammars, 
the  fact  that  virtually  all  current  models  deal  exclusively  with  the  energy  or  magnitude  of  the  wavelet  coef¬ 
ficients.  To  more  accurately,  realistically,  and  efficiently  represent  signal  singularity  structures,  we  proposed 
the  creation  of  new  complex  wavelet  transforms  and  analysis  tools.  The  result  was  our  Dual  Tree  Quaternion 
Wavelet,  an  efficient  transform  and  analysis  tool  that  features  near  shift  invariance  and  linear  computational 
complexity. 

•  New  geometric  statistical  models:  Traditional  wavelet-based  image  processing  algorithms  and  models 
have  significant  shortcomings  in  their  treatment  of  edge  contours.  Our  goal  in  creating  new  geometric  statis¬ 
tical  models  was  to  leverage  both  magnitude  and  phase  behavior  of  the  complex  wavelet  vocabulary  to  more 
better  represent  signal  singularities.  Our  work  on  this  project  thrust  produced  a  geometric  hidden  Markov 
tree  wavelet  model.  The  model  accounts  for  geometric  regularity  by  capturing  the  dependencies  between 
complex  wavelet  coefficients  along  a  contour,  and  immediately  led  to  a  feature  extraction  application. 

•  New  manifold-based  signal  representations:  We  wanted  to  explore  representations  of  higher  dimen¬ 
sional  signals  whose  locality  and  directionality  were  matched  to  the  geometry  of  lower-dimensional  manifold 
structures.  In  so  doing  our  aim  was  to  better  represent  signal  discontinuities,  an  important  aspect  of  many 
signal  processing  applications.  Our  work  focused  on  introducing  a  surflet  representation  for  N-dimensional 
Horizon  functions  containing  a  smooth  singularity  in  fV  —  1  dimensions.  Surflets  allowed  a  multiscale, 
piecewise  polynomial  approximation  of  discontinuities.  We  also  created  a  compression  algorithm  using  surflets 
that  comes  within  a  logarithmic  factor  of  the  optimal  rate-distortion  performance  for  this  function  class. 

1.2.2  Research  impact 

Our  new  models  and  analysis  open  unprecedented  possibilities  for  designing  processing  tools  matched  to  singularity- 
rich  data.  These  new  tools  have  led  to  significant  performance  gains  immediately  applicable  to  a  number  of 
important  Navy-relevant  applications,  including  target  detection  and  classification,  image  segmentation  and  fusion, 
and  computer  network  traffic  modeling.  Our  approach  generalizes  to  other  applications  as  well,  including  machine 
fault  monitoring,  medical  imagery,  robotics,  virtual  reality,  remote  sensing,  and  sensor  networks. 


2  Dual  Tree  Quaternion  Wavelets:  A  new  multiscale  transform 

2.1  Summary  of  results 

The  dual-tree  quaternion  wavelet  transform  (QWT)  is  a  new  multiscale  analysis  tool  for  geometric  image  features. 
The  QWT  is  a  near  shift-invariant  tight  frame  representation  whose  coefficients  sport  a  magnitude  and  three  phases: 
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two  phases  encode  local  image  shifts  while  the  third  contains  image  texture  information.  The  QWT  is  based  on 
an  alternative  theory  for  the  2-D  Hilbert  transform  and  can  be  computed  using  a  dual-tree  filter  bank  with  linear 
computational  complexity.  To  demonstrate  the  properties  of  the  QWT’s  coherent  magnitude/phase  representation, 
we  develop  an  efficient  and  accurate  procedure  for  estimating  the  local  geometrical  structure  of  an  image.  We  also 
develop  a  new  multiscale  algorithm  for  estimating  the  disparity  between  a  pair  of  images  that  is  promising  for 
image  registration  and  flow  estimation  applications.  The  algorithm  features  multiscale  phase  unwrapping,  linear 
complexity,  and  sub-pixel  estimation  accuracy. 

2.2  Review  of  Real  and  Complex  Wavelets 

2.2.1  Real  DWT 

The  real  DWT  represents  a  1-D  real-valued  signal  f{t)  in  terms  of  shifted  versions  of  a  scaling  function  (j){t)  and 
shifted  and  scaled  versions  of  a  wavelet  function  ip{f)  [1].  The  functions  (f>L,p{t)  =  2^(j){2^t  —  p)  and  ifi^pit)  = 
2^ip{2^t  —  p),£>L,pG'E  form  an  orthonormal  basis,  and  we  can  represent  any  f{t)  G  ^2(1^)  as 

/(*)  =  E  CL,p4>L,p{^)  +  ^  dl^plfl^pit),  (1) 

P^Tj  i'>L,pG'^ 

where  CL,p  =  J  f{t)(j)L,p{t)dt  and  de,p  =  J  f{t)'fn^p{f)dt  are  the  scaling  and  wavelet  coefficients,  respectively.  The 
parameter  L  sets  the  coarsest  scale  space  that  is  spanned  by  (j>L^p{t).  Behind  each  wavelet  transform  is  a  fllterbank 
based  on  lowpass  and  highpass  Alters. 

The  standard  real  2-D  DWT  is  obtained  using  tensor  products  of  1-D  DWTs  over  the  horizontal  and  vertical 
dimensions.  The  result  is  the  scaling  function  <j){x)(j){y)  and  three  subband  wavelets  ^(x)^(y),  ip{x)(l>{y),  and 
'ip{x)'il){y)  that  are  oriented  in  the  horizontal,  vertical,  and  diagonal  directions,  respectively  [1]. 

The  real  wavelet  transform  suffers  from  shift  variance;  that  is,  a  small  shift  in  the  signal  can  greatly  perturb 
the  magnitude  of  wavelet  coefficients  around  singularities.  It  also  lacks  a  notion  of  phase  to  encode  signal  location 
information  and  suffers  from  aliasing  [2].  These  issues  complicate  modeling  and  information  extraction  in  the 
wavelet  domain. 

2.2.2  Dual-tree  CWT 

The  1-D  dual-tree  CWT  expands  a  real- valued  signal  in  terms  of  two  sets  of  wavelet  and  scaling  functions  obtained 
from  two  independent  fllterbanks  [3],  as  shown  in  Figure  1.  We  will  use  the  notation  and  'tph{t)  to  denote 

the  scaling  and  wavelet  functions  and  Chj^  ^  and  dh,.  ^  to  denote  their  corresponding  coefficients,  where  h  specifies  a 
particular  set  of  wavelet  Alters.  The  wavelet  functions  'f’h{t)  and  ifg{t)  from  the  two  trees  play  the  role  of  the  real 
and  imaginary  parts  of  a  complex  analytic  wavelet  tf^it)  =  tfh(t)  +  j'lpgft).  The  imaginary  wavelet  f’gft)  is  the  1-D 
HT  of  the  real  wavelet  iph{t)-  The  combined  system  is  a  2x  redundant  tight  frame  that,  by  virtue  of  the  fact  that 
\fj‘^{t)\  is  non-oscillating,  is  near  shift-invariant.^ 

It  is  useful  to  recall  that  the  Fourier  transform  of  the  imaginary  wavelet  'l'g(w)  equals  when  a;  >  0 

and  when  w  <  0.  Thus,  the  Fourier  transform  of  the  complex  wavelet  function  g{uj)  =  '£''^(0;) 

has  no  energy  (or  little  in  practice)  in  the  negative  frequency  region,^  making  it  an  analytic  signal  [3]. 

2.2.3  Hilbert  transforms  and  2-D  CWT 

Extending  the  1-D  CWT  to  2-D  requires  an  extension  of  the  HT  and  analytic  signal.  There  exist  not  one  but 
several  different  definitions  of  the  2-D  analytic  signal  that  each  zero  out  a  different  portion  of  the  2-D  frequency 
plane  [5].  We  will  consider  two  definitions.  The  first,  proposed  by  Hahn  in  [6],  employs  complex  algebra  and  zeros 
out  frequencies  on  all  but  a  single  quadrant  {kx,ky  >  0,  for  example,  where  {kx,ky)  indexes  the  2-D  frequency 
plane).  In  this  formulation,  the  complete  2-D  analytic  signal  consists  of  two  parts:  one  having  spectrum  on  the 
upper  right  quadrant  {kx,  ky  >  0)  and  the  other  on  the  upper  left  quadrant  {kx  <  0,ky  >  0)  [5]. 

finitely  supported  function  can  never  be  exactly  analytic  [2].  In  practice,  we  can  only  design  finite-length  complex  wavelets  that 
are  approximately  analytic,  and  thus  the  CWT  is  only  approximately  shift-invariant  [3,4]. 

^Note  that  the  Fourier  transform  of  the  complex  scaling  function,  -|- j'I'g(a))  =  is  only  approximately  analytic  in 

practice,  and  so  its  support  will  leak  into  the  negative  frequency  region. 
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Figure  1:  The  1-D  dual-tree  CWT  is  implemented  using  a  pair  of  Biter  banks  operating  on  the  same  data  simultaneously. 
Outputs  of  the  Biter  banks  are  the  dual-tree  scaling  coefBcients,  Chp  and  Cg^,  and  the  wavelet  coefficients,  dhf  ^  and  dg^  at 
scale  t  and  shift  p.  The  CWT  coefBcients  are  then  obtained  as  dk.(  j,  +  j  dg^  ^ . 

Definition  1  [6]  Let  /  be  a  real-valued,  2-D  function.  The  complete  2-D  complex  analytic  signal  is  defined  in  the 
space  domain,  x  =  {x,y),  as  the  pair  of  complex  signals 

/ai(x)  =  [/(x)-/H*(x)]-hj[/«*i(x)-h/H*2(x)], 

/a2(x)  =  [/(x)-h/H^(x)]-hJ[/«^l(x)-/H^2(x)], 

where 


fmi  (x) 

=  /(x)  *  * 

7TX 

(4) 

/w*2(x) 

=  /(X).. 

(5) 

/w*(x) 

=  /(x).. 

TT^Xy 

(6) 

The  function  f-ui  is  the  total  HT;  the  functions  fuii  and  fui2  are  the  partial  HTs;  S{x)  and  d{y)  are  impulse  sheets 
along  the  y-axis  and  x-axis  respectively;  and  **  denotes  2-D  convolution. 

The  2-D  complex  analytic  signal  in  (2)-(3)  is  the  notion  behind  the  2-D  dual-tree  CWT  [3,4].  Each  2-D  CWT 
basis  function  is  a  2-D  complex  analytic  signal  consisting  of  a  standard  DWT  tensor  wavelet  plus  three  additional 
real  wavelets  obtained  by  1-D  HTs  along  either  or  both  coordinates.  For  example,  starting  from  real  DWT’s 
diagonal-subband  tensor  product  wavelet  /(x)  =  'iphix)'iph{y)  from  above  we  obtain  from  equations  (4)-(6)  its 
partial  and  total  HTs 

=  {i^g{x)tph{y),fph{x)ipg{y),ipg{x)tpg{y)). 

From  Definition  1,  we  then  obtain  the  two  complex  wavelets 

'>Pi{x,y)  =  {■ilJhix)')phiy)-tpg{x)2pg{y))-\-j{-tphix)'ipg{y)-\-iljg{x)iphiy)),  (J) 

'4’2i.x,y)  =  {'>ph{x)il)h{y)  +  '>Pg{x)i)g{y))+ j{il)h{x)'ipg{y)-'4)g{x)'ilJh{y)),  (8) 

having  orientations,  45°  and  —45°,  respectively.  Similar  expressions  can  be  obtained  for  the  other  two  subbands 
(±15°  and  ±75°)  based  on  tljh{x)(t)h{y)  and  (j)h{x)i^h{y) ■ 

Each  2-D  CWT  coefficient  has  only  a  single  phase  angle,  which  encodes  the  1-D  shift  of  image  features  perpen¬ 
dicular  to  its  subband  direction.  Figure  2(a)  illustrates  this  phase-shift  property.  This  encoding  may  be  sufficient  for 
local  1-D  structures  such  as  edges,  since  we  can  define  edge  shifts  uniquely  by  a  single  value,  say  r,  in  the  direction 
perpendicular  to  the  edge.  However,  even  in  this  case,  the  analysis  is  not  so  straightforward  when  the  edge  does  not 
align  with  the  six  orientations  of  the  CWT  subbands.  Moreover,  shifts  of  intrinsically  2-D  (non-edge)  image  features 
such  as  in  Figure  2(a)  require  two  values  (ri,r2)  in  the  x  and  y  directions,  respectively.  This  creates  ambiguity 
in  the  CWT  phase  shift.  We  can  resolve  this  ambiguity  by  using  the  coefficients  from  two  CWT  subbands,  but 
this  complicates  the  use  of  the  CWT  for  image  analysis,  modeling,  and  other  image  processing  applications.  In 
contrast.  Figure  2(b)  illustrates  a  more  convenient  encoding  of  image  shifts  in  absolute  a;,  y-coordinates  (with  two 
phase  angles)  using  the  quaternion  phases  of  our  new  QWT,  to  which  we  now  turn  our  attention. 


(2) 

(3) 
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(a) 


Figure  2:  (a)  The  CWT  coefficient’s  single  phase  angle  responds  linearly  to  image  shift  r  in  a  direction  orthogonal  to  the 
wavelet’s  orientation,  (b)  Two  of  the  QWT  coefficient’s  three  phase  angles  respond  linearly  to  image  shifts  (ri,r2)  in  an 
absolute  horizontal/vertical  coordinate  system. 


2.3  The  Quaternion  Wavelet  Transform 


2.3.1  Quaternion  Hilbert  transform 


There  are  several  alternatives  to  the  2-D  analytic  signal  of  Definition  1;  we  focus  here  on  one  due  to  Billow  [5]. 

It  combines  the  partial  and  total  HTs  from  (4)-(6)  to  form  an  analytic  signal  comprising  a  real  part  and  three 
imaginary  components  that  are  manipulated  using  quaternion  algebra  [7]. 

The  set  of  quaternions  El  =  {a  +  jib  +  j2C  +  j^d  |  a,  6,  c,  d  G  K}  has  multiplication  rules  jij2  =  — J2ji  =  ja  and 
=  j|  =  —  1  as  well  as  component- wise  addition  and  multiplication  by  real  numbers  [8].  Additional  multiplication 
rules  include:  jf  =  —1,  =  —jaj2  =  ji  and  j^ji  =  —jijs  =  j2-  Note  that  quaternionic  multiplication  is  not 

commutative.  The  conjugate  q*  of  a  quaternion  q  =  a  +  jib  +  j2C  +  j^d  is  defined  hy  q*  =  a  —  jib  —  j2C  —  jad  while 
the  magnitude  is  defined  as  jgl  =  \/qq* . 

An  alternative  representation  for  a  quaternion  is  through  its  magnitude  and  three  phase  angles:  q  =  \q\  ^3363^3202 

[7],  where  {9i,  62,  O3)  are  the  quaternion  phase  angles,  computed  using  the  following  formulae  (for  q  normalized,  i.e., 

Idl  =  1):  ^ 

6*3  =  --  arcsin(2(6c  —  ad)) ,  (9) 

and,  in  the  regular  case  (i.e.,  when  6*3  G  (  —  f,  f)), 


01 

02 


1 

-  arctan 
2 


/  2{bd  +  ac)  \ 
\a'^  +  b'^  —  —  d'^  )  ’ 


1 

-  arctan 
2 


/  2{cd  +  ab)  \ 
\  —  b'^  +  —  d"^  ) 


(10) 

(11) 


In  the  singular  case,  i.e.,  when  63  =  ±^,  9i  and  62  are  not  uniquely  defined.  Only  the  sum  (if  03  =  —  f )  or  the 
difference  (if  O3  =  |)  of  0i  and  02  is  unique  [7].  If  (0i,02,03)  calculated  from  (9)-(ll)  satisfy  =  —q, 

subtract  63  by  tt  if  63  >0;  add  tt  to  03  if  ^3  <  0.  As  a  result,  each  quaternion  phase  angle  is  uniquely  defined  within 
the  range  (0i,02,03)  e  [-7r,7r)  x  [-f ,  |)  x  [-f ,  f]. 

The  operation  of  conjugation  in  the  usual  set  of  complex  numbers,  C  =  a  -I-  jb,  where  a,  5  G  K  and  j^  =  —1,  is  a 
so-called  algebra  involution  that  fulfills  the  two  following  properties  for  any  z,w  G  C:  (z*)*  =  z  and  (wz)*  =  w* z* . 
In  H,  there  are  three  nontrivial  algebra  involutions: 


ai{q)  =  -jiqji  =a  +  jib  -  j2C  -  jad,  (12) 

a2{q)  =  -j2qj2  =  a-  jib  +  j2C  -  jad,  (13) 

o^aiq)  =  -jaqja  =  a-  jib  -  j2C  +  jsd.  (14) 

Using  these  involutions  we  can  extend  the  definition  of  Hermitian  symmetry.  A  function  /  :  ^  El  is  called 

quaternionic  Hermitian  if,  for  each  {x,y)  G  K^, 

f{-x,y)  =  a2if{x,y)),f{x,-y)  =  ai{f{x,y))  and  f{-x,-y)  =  a3{f{x,y)).  (15) 
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Blilow  introduces  an  alternative  definition  of  2-D  analytic  signal  based  on  the  quaternion  Fourier  transform 
(QFT)  [7].  The  QFT  of  a  2-D  signal  /  is  given  by 

^9{/}  =  F9(u)=  f  (16) 

where  iF‘^  denotes  the  QFT  operator,  u  =  (u,  v)  indexes  the  QFT  domain  and  the  quaternion  exponential 

g-ilSTTua;  g-iaZ-n-uy 

is  the  QFT  basis  function.  The  real  part  of  (17)  is  cos(27rMa;)  cos(27rr;y),  while  the  other  three  quaternionic  compo¬ 
nents  are  its  partial  and  total  HTs  as  defined  in  (4)-(6).  Note  that  the  QFT  of  a  real- valued  signal  is  quaternionic 
Hermitian,  and  each  QFT  basis  function  satisfies  the  definition  of  a  2-D  quaternion  analytic  signal. 

Definition  2  [5]  Let  /  be  a  real-valued  2-D  signal.  The  2-D  quaternion  analytic  signal  is  defined  as 

/IW  =  /W  +  ji/wiiW  +  J2/to2(x)  +  J3/to(x),  (18) 

where  the  functions  /wq,  fni2  ^nd  fui  are  defined  as  in  (4)-(6). 

2.3.2  QWT  construction 

Our  new  2-D  dual-tree  QWT  rests  on  the  quaternion  definition  of  2-D  analytic  signal.  By  organizing  the  four 
quadrature  components  of  a  2-D  wavelet  (the  real  wavelet  and  its  2-D  HTs)  as  a  quaternion,  we  obtain  a  2-D 
analytic  wavelet  and  its  associated  quaternion  wavelet  transform  (QWT).  For  example,  for  the  diagonal  subband, 
with  {f,fnii,fni2,fm)  =  {f’h{x)'iph{y),i>g{x)f^h{y),i’h{x)'tpg{y),'ipg{x)-tpg{y)),  we  obtain  the  quaternion  wavelet 

ip^{x,y)  =  tph{x)ifh{y)  +  jiipg{x)tph{y)  +  j2'ifh{x)'tpg{y)  +  j3ipg{x)tpg{y).  (19) 

To  compute  the  QWT  coefficients,  we  can  use  a  separable  2-D  implementation  [4]  of  the  dual-tree  filter  bank 
in  Figure  1.  During  each  stage  of  filtering,  we  independently  apply  the  two  sets  of  h  and  g  wavelet  filters  to 
each  dimension  {x  and  y)  of  a  2-D  image;  for  instance,  applying  the  set  of  filters  h  to  both  dimensions  yields 
the  scaling  coefficients  ChhL.p  ^nd  the  diagonal,  vertical,  and  horizontal  wavelet  coefficients, 
d^hi  respectively.  Therefore,  the  resulting  2-D  dual-tree  implementation  comprises  four  independent  filter  banks 
(corresponding  to  all  possible  combinations  of  wavelet  filters  applied  to  each  dimension  (hh,  hg,  gh,  and  gg)) 
operating  on  the  same  2-D  image.  We  combine  the  wavelet  coefficients  of  the  same  subband  from  the  output  of 
each  filter  bank  using  quaternion  algebra  to  obtain  the  QWT  coefficients;  for  example,  for  the  diagonal  subband: 
df,p  =  dhh^^  +  did^ht,p  +  hdhg^^  +  jsd^g^^. 

Figure  3(c)  illustrates  the  four  components  of  a  quaternion  wavelet  and  its  quaternion  magnitude  for  the  diagonal 
subband.  The  partial  and  total  HT  components  resemble  iph{x)'ijjfi{y)  but  are  phase-shifted  by  90°  in  the  horizontal, 
vertical,  and  both  directions.  The  magnitude  of  each  quaternion  wavelet  (square  root  of  the  sum-of-squares  of  all 
four  components)  is  a  smooth  bell-shaped  function.  We  can  also  interpret  the  four  components  of  tp^{x,y)  in  the 
QFT  domain  as  multiplying  the  quadrants  of  the  QFT  of  iphix)'tph{y)  by  ±ji  or  ±j2,  or  both,  as  shown  in  Figure  4. 
Note  that  the  order  of  multiplication  is  important  because  quaternion  multiplication  is  non-commutative.  This 
quaternion  wavelet,  fP{x,y),  has  support  in  only  a  single  quadrant  of  the  QFT  domain  . 

The  construction  and  properties  are  similar  for  the  other  two  subband  quaternion  wavelets  based  on  (j)h{x)'iph{y) 
and  'iljh{x)(j)h{y)  (see  the  horizontal  and  vertical  subband  wavelets,  ip^{x,y)  and  ip'^{x,y)  in  Figures  3(a)  and  (b), 
respectively).  In  summary,  in  contrast  with  the  six  complex  pairs  of  CWT  wavelets  (12  functions  in  total),  the 
QWT  sports  three  quaternion  sets  of  four  QWT  wavelets  (12  functions  in  total). 

Finally,  note  that  the  quaternion  wavelet  transform  is  approximately  a  windowed  quaternion  Fourier  transform 
(QFT)  [7].  In  contrast  to  the  QFT  in  (16),  the  basis  functions  for  the  QWT  are  scaled  and  shifted  versions  of 
the  quaternion  wavelets  (tphix)  +  jii!g{x)){4>h{y)  +  j2(t)g{y)),  {4'h{x)  +  ji(l>g{x)){'iph{y)  +  j2'ipg{y)),  and  (iphix)  + 
ji'>l’g{x)){ifh{y)  +  j2'tfg{y))- 

2.4  QWT  Applications 

In  this  section,  we  demonstrate  the  utility  of  the  QWT  with  two  applications  in  geometrical  edge  structure  and 
image  disparity  estimation. 
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Figure  3:  Three  quaternion  wavelets  from  the  2-D  dual-tree  QWT  frame.  Each  quaternion  wavelet  comprises  four  compo¬ 
nents  that  are  90°  phase  shifts  of  each  other  in  the  vertical,  horizontal,  and  both  directions,  (a)  Horizontal  subband,  from  left 
to  right:  <ph{x)^h{y)  (a  usual,  real  DWT  tensor  wavelet),  (/>h(x)i/)g(y),  (l>g{x)tph{y),  (l>g{x)^g{y),\tp^{x,y)\.  (b)  Vertical  sub¬ 
band,  from  left  to  right:  'iph{x)4>h{y)  (a  usual,  real  DWT  tensor  wavelet),  iph{x)4>g{y),  '4:g{x)4>h{y),  ipg{x)4>g{y),\'ip^ {x,  y)\.  (c) 
Diagonal  subband,  from  left  to  right:  iph(x)iph(y)  (a  usual,  real  DWT  tensor  wavelet),  tj}h{x)'4>g{y),  tpg{x)%l)h{y),  '4>g{x)%l)g{y), 
{x,y)\.  The  image  on  the  far  right  is  the  quaternion  wavelet  magnitude  for  each  subband,  a  non-oscillating  function. 


2.4.1  Edge  geometry  estimation 

Edges  are  the  fundamental  building  blocks  of  many  real-world  images.  Roughly  speaking,  a  typical  natural  image 
can  be  considered  as  a  piecewise  smooth  signal  in  2-D  that  contains  blocks  of  smooth  regions  separated  by  edge 
discontinuities  due  to  object  occlusions.  Here  we  use  the  QWT  magnitude  and  phase  to  extend  the  multiscale  edge 
geometry  analysis  of  [9, 10]. 

Theory 

Consider  an  image  /(x)  containing  a  dyadic  image  block  that  features  a  single  step  edge,  as  parameterized  in 
Figure  5(a).  Note  that  for  an  edge  oriented  at  angle  /3,  any  shift  (ri,r2)  in  the  {x,y)  directions  satisfying  the 
constraint 

ri  cos /3 -I- r2  sin /3  =  r  (20) 

is  identical  to  a  shift  from  the  center  of  the  block  by  r  in  the  direction  perpendicular  to  /3. 

Our  goal  in  this  section  is  to  analyze  the  phase  {01,62,03)  of  the  quaternion  wavelet  coefficient  (e.g.,  for 
the  vertical  subband)  corresponding  to  the  quaternion  wavelet  ^/^(x)  whose  support  aligns  with  the  dyadic  image 
block  containing  the  edge  (the  other  subbands  behave  similarly).  We  will  show  that  (01,6*2)  and  O3  provide  an 
accurate  means  with  which  to  estimate  the  edge  offset  r  and  edge  orientation  jd  respectively. 

First,  we  establish  a  linear  relationship  between  {61,02)  and  the  edge  offset  r  using  the  shift  theorem  of  [12].  The 
effective  center  frequency  {u,  v)  depends  on  both  the  image  QFT  spectral  content  F‘^{u)  and  the  center  frequency  of 
the  quaternion  wavelet,  and  it  always  lies  in  the  first  quadrant.  Since  the  spectral  energy  of  the  edge  QFT,  F'^(u), 
concentrates  along  two  impulse  ridges  through  the  origin  having  orientations  90°  —  jd  and  (d  —  90°  in  the  QFT 
domain  (see  Figure  5(b)  and  the  appendix  of  [12]),  we  can  write  {u,v)  =  c(|  cos/3|,  |  sin/3|),  where  c  is  a  positive 
constant  that  depends  on  (d,  the  subband,  and  the  scale  of  analysis  £.  When  the  edge  passing  through  the  image 
block  center  displaces  perpendicularly  by  r,  the  changes  in  phase  angles  (A0i,A02)  satisfy  ri  =  and  r2  = 
Plugging  {u,  v)  =  c(|  cos  (d\,  \  sin (d\)  and  using  (20),  we  obtain  the  concise  formula 


A01  ±  AO2 

2ttc 


(21) 
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Figure  4:  Quaternion  Fourier  domain  relationships  among  the  four  quadrature  components  of  a  quaternion  wavelet  ip^{x,  y) 
in  the  diagonal  subband.  The  QFT  spectra  of  the  real  wavelet  tph{x)tph{y)  in  the  first  to  fourth  quadrants  are  denoted 
by  (Fsj ,  Fsj ,  Fsg,  respectively.  The  partial  and  total  Flilbert  transform  operations  are  equivalent  to  multiplying  the 
quadrants  of  ixi  (a)  by  Tji,  or  ±j2,  or  both. 


(a)  single  edge  model  (wedgelet) 


V  / 

'  positive  quadrant 

(u, 

""  L 

leakage  leakage  quadrant 
(b)  edge  QFT  spectrum 


Figure  5:  (a)  Parameterization  of  a  single  edge  in  a  dyadic  image  block  (a  wedgelet  [11]).  (b)  QFT  spectrum  of  the  edge; 
shaded  squares  represent  the  quaternion  wavelets  in  the  horizonal,  vertical,  and  diagonal  subbands.  The  energy  of  the  edge 
is  concentrated  along  the  two  dark  lines  crossing  at  the  origin  and  is  captured  by  the  vertical  subband  with  effective  center 
frequency  at  quaternion  frequency  (u,v).  The  region  bounded  by  the  dashed  line  demonstrates  the  spectral  support  of  the 
QWT  basis  “leaking”  into  the  neighboring  quadrant. 


where  we  choose  A6i  +  Ad2  when  tan  (3  >  0,  and  A9i  —  A62  when  tan  /3  <  0.  We  have  verified  this  relationship  via 
experimental  analysis  of  straight  edges  in  detail  in  earlier  work  [13]. 

Based  on  the  interpretation  of  the  QWT  as  a  local  QFT,  we  use  an  inner  product  formula  to  analyze  the  behavior 
of  03  for  the  same  edge  block.  The  QWT  coefficient  can  be  computed  from  the  QFT  inner  product  between 
V'7p(^)  ^nd  the  QFT  of  the  edge  image  /(x)  (similarly  for  the  other  subbands).  Our  analysis  in  the  appendix 
of  [12]  states  that  if  the  quaternion  wavelet  is  perfectly  analytic,  then  regardless  of  (/3, r),  03  =  f  when  tan/3  >  0 
and  03  =  —f  when  tan/3  <  0.  Note  that  this  corresponds  to  the  singular  case  in  the  quaternion  phase  calculation 
in  Section  2.3.1. 

However,  practical  quaternion  wavelets  will  not  be  perfectly  analytic,  and  so  their  QFT  support  will  leak  into 
other  quadrants  of  the  QFT  domain  as  in  Figure  5(b).  This  necessitates  the  more  in-depth  analysis  (see  Appendix 
of  [12]),  which  shows  that  in  this  case 

03  =  ^  arcsin 

where  e  is  a  measure  of  the  ratio  of  local  signal  energy  in  the  positive  quadrant  to  the  energy  in  the  leakage  quadrant. 
For  the  vertical  subband  as  shown  in  Figure  5(b),  when  the  edge  orientation  /3  changes  from  0°  to  45°,  this  ratio  e 
changes  from  1  to  0  and  thus  O3  changes  from  0  to  We  model  this  behavior  of  03  in  the  horizontal  and  vertical 
subbands  to  design  an  edge  orientation  estimation  in  the  next  section.  Since  the  diagonal  subband  wavelet  has  QFT 
support  distant  from  the  leakage  quadrants,  this  QWT  subband  coefficients  are  almost  unaffected  by  leakage 
(i.e.,  e  «  0).  Their  corresponding  O3  approximately  equal  and  do  not  vary  with  /3. 

Practice 

Based  on  the  above  analysis,  we  propose  a  hybrid  algorithm  to  estimate  the  edge  geometry  (/3,  r)  based  on  the 
QWT  phase  (0i,  02,  ds)  and  the  magnitude  ratios  between  the  three  subbands.  We  generate  a  set  of  wedgelets  with 
known  /3  and  r  (see  Figure  5(a)  [11])  for  analysis  and  testing.  Our  algorithm  is  reminiscent  of  the  edge  estimation 
scheme  in  [10]. 


Figure  6:  Local  edge  geometry  estimation  using  the  QWT.  (a)  Several  edgy  regions  from  the  “cameraman”  image  are 
shown  on  the  left;  (b)-(e)  on  the  right  are  edge  estimates  from  the  corresponding  QWT  coefficients.  The  upper  row  shows 
the  original  image  region,  the  lower  row  shows  a  wedgelet  (see  Figure  5(a))  having  the  edge  parameter  estimates  (/?,  r).  (No 
attempt  is  made  to  capture  the  texture  within  the  block.) 

To  estimate  the  edge  orientation  P,  we  use  both  the  magnitude  ratios  among  the  three  subbands  and  63  of  the 
subband  with  the  largest  magnitude.  The  subband  with  the  largest  magnitude  gives  the  approximate  orientation  of 
the  edge  (±45°  for  diagonal,  ±15°  for  horizontal  and  ±75°  for  vertical);  the  sign  of  63  tells  whether  the  direction  is 
positive  or  negative.  We  experimentally  analyze  the  QWT  magnitude  ratios  and  63  of  the  set  of  generated  wedgelets 
corresponding  to  changing  edge  orientations  P  by  multiples  of  5°.  Using  standard  curve-fitting  techniques,  we 
develop  a  simple  relationship  between  these  parameters  and  P  for  our  orientation  estimation  scheme.  The  resulting 
orientation  estimation  algorithm  achieves  a  maximum  error  of  only  /3  ±  3°  in  practice  for  ideal  edges. 

To  estimate  the  offset  r  of  the  edge,  we  apply  the  relationship  between  (0i,02)  and  r  in  (21).  We  use  (01,6*2) 
from  either  the  horizontal  or  vertical  subband  (whichever  has  a  larger  magnitude) .  Depending  on  tan  /?,  we  compute 
the  sum  (or  difference)  of  the  change  in  phase  angles  A0i  ±  A02  for  the  edge  under  analysis,  using  r  =  0  as  the 
reference  edge.  Upon  analysis  of  the  simulated  wedgelets  with  known  r,  we  estimate  c  «  0.7,  to  be  used  in  (21).  Our 
final  edge  offset  estimation  algorithm  achieves  a  maximum  error  of  approximately  ±0.02  relative  to  the  normalized 
unit  edge  length  of  the  dyadic  block  (that  is,  sub-pixel  accuracy).  More  details  of  the  experimental  analysis  for 
the  wedgelet  model  can  be  found  in  [13].  According  to  (21),  within  a  27r-range  of  A0i  ±  A02,  the  range  of  r  is 
limited  to  an  interval  of  length  «  1.43  which  ensures  that  the  edge  stays  within  the  image  block  under  analysis. 
Therefore,  in  our  offset  estimation,  we  need  only  consider  one  27r-range  of  A0i  ±  A02  and  do  not  need  to  perform 
any  “phase-unwrapping” . 

Finally,  we  estimate  the  polarity  of  the  edge.  By  first  obtaining  an  offset  estimate  for  each  polarity  of  the  edge 
with  orientation  P  estimated  above,  namely  r'^  and  r~ ,  we  use  the  inner  product  between  the  image  block  and 
two  wedgelets  with  the  estimated  edge  parameters  (/?,  r"*")  and  (P,r~)  to  determine  the  correct  polarity.  Although 
our  calculation  of  estimation  accuracy  is  based  on  the  wedgelet  model,  our  algorithm  also  works  well  for  real-world 
images  such  as  the  popular  “cameraman”  image  in  Figure  6. 

Our  results  demonstrate  the  close  relationship  between  edge  geometry  and  QWT  magnitude  and  phases,  in 
particular,  the  encoding  of  edge  location  in  the  QWT  phases  (0i,02)  and  the  encoding  of  edge  orientation  in  the 
QWT  magnitude  and  third  phase  63. 

2.4.2  Image  disparity  estimation 

In  this  section,  as  another  example  of  QWT-based  data  processing,  we  present  an  algorithm  to  estimate  the  local 
disparities  between  the  target  image  A(x,y)  and  the  reference  image  B(x,y).  Disparity  estimation  is  the  process 
of  determining  the  local  translations  needed  to  align  different  regions  in  two  images,  that  is,  the  amount  of  2-D 
translation  required  to  move  a  local  region  of  a  target  image  centered  at  pixel  (x',y')  to  align  with  the  region  in 
a  reference  image  centered  at  the  same  location  (x',y').  This  problem  figures  prominently  in  a  range  of  image 
processing  and  computer  vision  tasks,  such  as  video  processing  to  estimate  motion  between  successive  frames, 
time-lapse  seismic  imaging  to  study  changes  in  a  reservoir  over  time,  medical  imaging  to  monitor  a  patient’s  body, 
super-resolution,  etc. 

Recall  that  the  QWT  phase  property  states  that  a  shift  (ri,r2)  in  an  image  changes  the  QWT  phase  from 
(01,02,03)  to  (01  —  27rMri,02  —  27n;r2,03).  Thus,  for  each  QWT  coefficient,  if  we  know  {u,v),  the  effective  center 
frequency  of  the  local  image  region  analyzed  by  the  corresponding  QWT  basis  functions,  then  we  can  estimate  the 
local  image  shifts  (ri,r2)  from  the  phase  differences. 
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However,  the  center  frequency  (u,  v)  is  image  dependent  in  general.  To  be  able  to  estimate  image  shifts  from 
QWT  phase  differences,  we  first  need  to  estimate  {u,v)  for  each  QWT  coefficient.  For  this  estimate,  we  can  again 
use  the  QWT  phase  properties.  If  we  know  the  image  shifts  and  measure  the  phase  difference,  then  we  can  compute 
{u,v). 

By  manually  translating  the  reference  image  A{x,  y)  by  known  small  amounts  both  horizontally  and  vertically, 
we  obtain  two  images  A{x,y)  and  A{x  —  ri,y  —  r2).  After  computing  the  QWTs  of  A(x,y)  and  A(x  —  ri, y  —  r2),  we 
can  use  the  phase  differences  (A6*i,  AO2)  between  the  QWT  coefficients  to  obtain  estimates  for  the  effective  spectral 
center  {u,v)  for  each  dyadic  block  across  all  scales  as  m  =  and  v  =  The  range  of  QWT  phase  angles 

limits  our  estimates  (u,u)  to  and  [—4^,  4^)  for  horizontal  and  vertical  shifts,  respectively,  where  R  is 

the  length  of  one  side  of  the  dyadic  block  corresponding  to  each  coefficient. 

Once  we  know  the  center  frequency  (u,v)  for  each  QWT  coefficient,  we  can  estimate  the  local  image  shifts  by 
measuring  the  difference  between  the  QWT  phase  corresponding  to  the  same  local  blocks  in  image  A(x,  y)  and 
B{x,y). 

A  key  challenge  in  phase-based  disparity  estimation  is  resolving  the  phase  wrap-around  effect  due  to  the  limited 
range  of  phase  angles.  Due  to  phase  wrapping,  each  observed  phase  difference  can  be  mapped  to  more  than  one 
disparity  estimate.  Specifically,  for  QWT  phase  differences  (A0i,  A02)  between  the  reference  and  target  images,  we 
can  express  the  possible  image  shifts  of  each  dyadic  block  as 


ri 


A6*i  -I-  7r(2n  -|-  k) 
2'ku 


r2  = 


A62  +  rriTT 
2nv 


(23) 


where  n,m  €  Z  and  k  G  {0, 1}.  Depending  on  m,  k  is  chosen  such  that  it  equals  0  when  m  is  even  and  equals  1 
when  m  is  odd.  The  special  wrap-around  effect  in  (23)  is  due  to  the  limited  range  in  9i  and  02  (to  [— tt,  tt)  and 
[-f ,  I )  respectively). 

In  our  multiscale  disparity  estimation  technique,  we  use  coarse  scale  shift  estimates  to  help  unwrap  the  phase  in 
the  finer  scales.  If  we  assume  that  the  true  image  shift  is  small  compared  to  the  size  of  dyadic  squares  at  the  coarsest 
scale  L,  then  we  can  set  m  =  n  =  fc  =  0in  (23)  at  this  scale  (no  phase  wrap-around)  and  obtain  correct  estimates 
for  ri  and  r2 .  Effectively,  this  assumption  of  no  phase  wrap-around  at  the  coarsest  scale  limits  the  maximum  image 
shift  that  we  can  estimate  correctly.  Once  we  have  shift  estimates  at  scale  L,  for  each  block  at  scale  £  =  L  —  1,  we 
estimate  the  shifts  as  follows: 


1.  interpolate  the  estimates  from  the  previous  scale(s)  to  obtain  predicted  estimates  (r4,rf), 

2.  substitute  (A0i,  A02)  into  (23)  and  determine  the  {n,k,m)  such  that  (ri,r2)  is  closest  to  (rf,r2), 

3.  remove  any  unreliable  (ri,r2), 

4.  repeat  Steps  1-3  for  the  finer  scales  £  =  L  —  2,  L  —  3, .. . 

Step  1  uses  either  nearest-neighbor  interpolation  (which  gives  higher  estimationaccuracy)  or  bilinear  interpola¬ 
tion  (which  results  in  a  smoother  disparity  field  for  better  visual  quality).  We  choose  the  latter  in  our  simulations. 
In  Step  3,  we  use  a  similar  reliability  measure  as  in  the  confidence  mask  [7]  to  threshold  unreliable  phase  and  offset 
estimates.  We  also  threshold  based  on  the  magnitude  of  the  QWT  coefficients.  We  iterate  the  above  process  until 
a  fine  enough  scale  (e.g.,  £  =  2),  since  estimates  typically  become  unreliable  at  this  scale  and  below.  The  QWT 
coefficients  for  the  small  dyadic  blocks  have  small  magnitudes,  and  so  their  phase  angles  are  very  sensitive  to  noise. 

We  can  improve  upon  the  basic  iterative  algorithm  by  fusing  estimates  across  subbands  and  scales.  First,  with 
proper  interpolation,  we  can  average  over  estimates  from  all  scales  containing  the  same  image  block.  Second,  we 
can  average  estimates  from  the  three  QWT  subbands  for  the  same  block  to  yield  more  accurate  estimates,  but  we 
need  to  discard  some  unreliable  subband  estimates  (for  example,  horizontal  disparity  ri  in  the  horizontal  subband 
and  r2  in  the  vertical  subband).  We  incorporate  these  subband/scale  averaging  steps  into  each  iteration  of  Steps 
1-4.3 

Figure  7  illustrates  the  result  of  our  QWT  phase-based  disparity  estimation  scheme  for  two  frames  from  the 
rotating  Rubik’s  cube  video  sequence  [7].  This  is  an  interesting  sequence,  because  a  rotation  cannot  be  captured 
by  a  single  global  translation  but  can  be  closely  approximated  by  local  translations.  The  arrows  indicate  both  the 
directions  and  magnitudes  of  the  local  shifts,  with  the  magnitudes  stretched  for  better  visibility.  We  can  clearly  see 
the  rotation  of  the  Rubik’s  cube  on  the  circular  stage,  with  larger  translations  closer  to  the  viewer  (bottom  of  the 
image)  and  smaller  translations  further  from  the  viewer  (top  of  the  image).  In  our  experiments,  we  obtained  the 
most  robust  estimations  by  averaging  over  both  scales  and  subbands. 

^Matlab  software  is  available  at  http://www.dsp.rice.edu/software/qwt.shtml. 
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(a)  reference  image  (b)  target  image 


Figure  7:  Multiscale  QWT  phase-based  disparity  estimation  results,  (a),  (b)  Reference  and  target  images  from  the  “Rubik’s 
cube”  image  sequence  [7].  (c)  Disparity  estimates  between  two  images  in  the  sequence,  shown  as  arrows  overlaid  on  top  of 
the  reference  image  (zoomed  in  for  better  visualization,  arrow  lengths  proportional  to  magnitude  of  disparity). 


(a)  reference  image  (heart 
during  contraction) 


(b)  coarse-scale  disparity 
estimates  (during  contraction) 


(c)  coarse-scale  disparity 
estimates  (during  relaxation) 


(d)  target  image  (heart 
during  contraction) 


(e)  fine-scale  disparity 
estimates  (during  contraction) 


estimates  (during  relaxation) 


Figure  8:  Multiscale  QWT  phase-based  disparity  estimation  results  for  the  “heart”  image  sequence,  (a),  (d)  Reference  and 
target  “heart  ’’images  from  two  frames  during  heart  contraction  (systole).  Estimation  results  show  (b)-(c)  coarse-scale  and 
(e)-(f)  fine-scale  detailed  motion  of  the  heart  and  blood  How  during  the  contraction  and  expansion  (systole  and  diastole) 
phases  of  the  cardiac  cycle,  illustrating  the  multiscale  nature  of  our  algorithm. 

Figure  8  demonstrates  the  multiscale  nature  of  our  disparity  estimation  approach  on  an  image  sequence  of  a 
living  heart  [14].  The  presence  of  sharp  edges  plus  the  smooth  cardiac  motion  in  these  images  is  well-matched  by 
our  wavelet-based  analysis.  Our  algorithm  averages  over  several  scales  £  to  obtain  motion  fields  in  various  levels 
of  detail.  Figure  8(b)  and  (c)  clearly  show  the  coarse-scale  contraction  and  relaxation  motions  of  the  heart,  while 
Figure  8(e)  and  (f)  display  more  detailed  movements  of  the  heart  muscles,  arteries  and  blood  flow  (in  particular, 
see  the  arrows  toward  left  region  of  the  heart  near  the  artery) . 

In  addition  to  visualizing  the  changes  from  one  image  to  another,  we  can  use  our  algorithm  as  the  feature 
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Figure  9:  Comparison  of  multiscale  QWT  phase-based  disparity  estimation  with  two  motion  estimation  algorithms,  Gra¬ 
dient  Correlation  (GC)  [15]  and  exhaustive  search  (ES).  The  performance  measure  is  PSNR  (in  dB)  between  the  motion- 
compensated  image  and  the  target  image  of  four  test  image  sequences  (“Rubik”,  “Heart”  and  “Taxi”),  (a)  Frame-by-frame 
PSNR  performance  comparison  in  the  “Rubik”  sequence,  (b)  Table  of  average  PSNR  performance  (over  all  frames)  for  each 
test  sequence.  The  multiscale  QWT  phase-based  method  demonstrates  the  best  performance  among  the  three  test  algorithms 
for  the  “Rubik”  sequence  and  shows  comparable  performance  to  the  other  algorithms  for  the  “Heart”  and  “Taxi”  sequences. 
Last  row  of  table  shows  the  computational  complexity  of  each  algorithm. 


matching  step  of  an  image  registration  algorithm  in  the  process,  using  the  disparity  information  to  build  a  warping 
function  to  align  the  images.  One  important  note  is  that  our  QWT-based  method  is  region-based  in  that  it  does 
not  require  any  detection  of  feature  points  or  other  image  landmarks.  Traditional  region-based  feature  matching 
methods,  which  estimate  the  spatial  correlation  (or  correspondence)  between  two  images,  include  block-matching  and 
Fourier  methods.  For  comparison,  we  compare  our  approach  to  the  exhaustive  search  (ES)  block  matching  technique, 
which  is  very  computationally  demanding  but  features  the  best  performance  among  all  general  block-matching 
techniques.  We  also  compare  to  a  Fourier  sub-pixel  motion  estimation  method  known  as  Gradient  Correlation 
(GC),  which  has  been  shown  to  have  better  PSNR  performance  than  other  recent  Fourier  methods  [15]. 

As  a  performance  measure,  we  use  the  Peak  Signal-to-Noise  Ratio  (PSNR)  between  the  motion  compensated 
image  C(x,y)  and  the  target  image  B(x,y),  which  is  given  by 


lOlogio 


(255)^N 

EJi?(x)-C'(x))2) 


(24) 


where  N  is  the  number  of  image  pixels.  The  motion  compensated  image  C(x,  y)  is  obtained  by  shifting  each  image 
block  in  the  reference  image  A(x,y)  according  to  the  estimated  motion  vectors.  Figure  9  compares  the  results  for 
four  image  sequences:  the  “Rubik”  and  “Taxi”  sequences  commonly  used  in  the  optical  flow  literature,  and  the 
“Heart”  sequence  from  Figure  8.  Figure  9(a)  demonstrates  the  superior  performance  of  our  QWT  phase-based 
algorithm  over  the  other  algorithms  for  the  “Rubik”  sequence,  which  has  piecewise-smooth  image  frames  and  a 
smooth  underlying  disparity  flow.  While  its  PSNR  performance  is  relatively  far  from  ES  for  the  “Heart”  sequence, 
we  note  that  the  QWT  phase-based  approach  provides  a  motion  field  that  is  more  useful  for  patient  monitoring 
and  diagnosis.  For  the  “Taxi”  and  “Mobcal”  sequences,  which  contain  discontinuities  in  their  underlying  flows,  the 
QWT  phase-based  algorithm  sports  comparable  performance  (see  the  table  in  Figure  9(b)).  Since  the  multiscale 
averaging  step  in  our  algorithm  tends  to  smooth  out  the  estimated  flow,  it  should  not  be  expected  to  perform  as 
well  for  discontinuous  motions  fields  of  rigid  objects  moving  past  each  other. 

Additional  advantages  of  our  QWT-based  algorithm  include  its  speed  (linear  computational  complexity)  and 
sub-pixel  estimation  accuracy.  For  an  7V-pixel  image,  our  0(N)  algorithm  is  more  efficient  than  the  0(N  log  N) 
FFT-based  GG  and  significantly  faster  than  ES,  which  can  take  up  to  0(N‘^)  with  the  search  parameter  on  the 
order  of  N .  General  block-matching  techniques  such  as  ES  can  only  decipher  disparities  in  an  integer  number  of 
pixels.  On  the  other  hand,  our  QWT-based  algorithm  can  achieve  sub-pixel  estimation  and  demonstrates  greater 
accuracy  for  the  “Rubik”  sequence  than  existing  phase-based  sub-pixel  estimation  methods  such  as  GG. 

Besides  gradient  correlation,  there  exist  other  phase-based  algorithms  for  disparity  estimation  and  image  reg¬ 
istration  [7, 16-20].  These  approaches  use  phase  as  a  feature  map  4)(x)  where  the  phase  function  4)  maps  2-D 
X,  ^-coordinates  to  phase  angles.  They  assume  the  phase  function  to  stay  constant  upon  a  shift  from  the  reference 
image  to  the  target  image;  that  is,  4)i(x)  =  <I)2(x-l-r)  where  <I)i  is  the  phase  function  for  the  reference  image  and  ^2 
for  the  target  image.  Then,  the  disparity  estimation  problem  is  simplified  to  calculating  the  optical  flow  for  these 
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phase  functions  [16,21].  In  contrast,  our  algorithm  is  entirely  based  on  the  multiscale  dual-tree  QWT  and  its  shift 
theorem. 

Our  approach  is  similar  to  Bayro-Corrochano’s  QWT  disparity  estimation  algorithm  in  [19]  in  its  use  of  quater¬ 
nion  phase  angles.  However,  the  latter  approach  requires  the  design  of  a  special  filter  to  compute  the  phase  derivative 
function  in  advance,  while  our  approach  need  only  estimate  the  local  frequencies  {u,v).  Our  implementation  also 
uses  a  dual-tree  filterbank,  as  compared  to  the  quaternion  wavelet  pyramid  of  Gabor  filters  in  [19].  Provided  a 
continuous  underlying  disparity  flow,  our  algorithm  yields  a  denser  and  more  accurate  disparity  map,  even  for 
smooth  regions  within  an  image.  Incorporating  an  optimization  procedure  such  as  in  [20]  or  a  statistical  model  into 
our  current  algorithm  can  further  improve  estimation  accuracy,  particularly  for  blocks  with  phase  singularity,  but 
requires  extra  computation  time. 

Kingsbury  et  al.  have  developed  a  multiscale  displacement  estimation  algorithm  based  on  the  2-D  CWT  [16, 17]. 
Their  approach  combines  information  from  all  six  CWT  subbands  in  an  optimization  framework  based  on  the 
optical  flow  assumptions.  In  addition  to  disparity  estimation,  it  simultaneously  registers  the  target  image  to  the 
reference  image.  In  comparison,  both  the  QWT  and  CWT  methods  are  multiscale  and  wavelet-based  and  are  thus, 
in  general,  best  for  smooth  underlying  disparity  flows.  However,  our  algorithm  is  much  simpler  and  easier  to  use 
because  it  does  not  involve  the  tuning  of  many  input  parameters  for  the  iterative  optimization  procedures  as  in 
the  CWT  algorithm.  While  our  method  estimates  local  disparities  without  warping  the  image,  we  can  apply  any 
standard  warping  procedure  to  easily  register  the  two  images  from  the  estimated  disparities.  Thanks  to  the  ability 
of  the  QWT  to  encode  location  information  in  absolute  horizontal/vertical  coordinates,  we  can  easily  combine  the 
QWT  subband  estimates  to  yield  more  accurate  flow  estimation  results.  Combining  subband  location  information 
in  the  2-D  CWT  is  more  complicated,  since  each  subband  encodes  the  disparities  by  complex  phase  angles  in  a 
reference  frame  rotated  from  other  subbands.  Based  on  our  experimental  results  and  comparison  of  the  design  of 
our  flow  estimation  algorithm  with  previous  approaches,  the  QWT  demonstrates  its  ability  to  efficiently  represent, 
encode  and  process  location  information  in  images. 

2.5  Conclusions  and  Future  Work 

We  have  introduced  a  new  2-D  multiscale  wavelet  representation,  the  dual-tree  QWT,  that  is  particularly  efficient 
for  coherent  processing  of  relative  location  information  in  images.  This  tight-frame  transform  generalizes  complex 
wavelets  to  higher  dimensions  and  inspires  new  processing  and  analysis  methods  for  wavelet  phase. 

Our  development  of  the  dual-tree  QWT  is  based  on  an  alternative  definition  of  the  2-D  HT  and  2-D  analytic 
signal  and  on  quaternion  algebra.  The  resulting  quaternion  wavelets  have  three  phase  angles;  two  of  them  encode 
phase  shifts  in  an  absolute  horizontal/vertical  coordinate  system  while  the  third  encodes  textural  information. 
The  QWT’s  approximate  shift  theorem  enables  efficient  and  easy-to-use  analysis  of  the  phase  behavior  around 
edge  regions.  We  have  developed  a  novel  multiscale  phase-based  disparity  estimation  scheme.  Through  efficient 
combination  of  disparity  estimates  across  scale  and  wavelet  subbands,  our  algorithm  clearly  demonstrates  the 
advantages  of  coherent  processing  in  this  new  QWT  domain.  Inherited  from  its  complex  counterpart,  the  QWT 
also  features  near  shift-invariance  and  linear  computational  complexity  through  its  dual-tree  implementation. 

Beyond  2-D,  the  generalization  of  the  Hilbert  transform  to  u-D  signals  using  hypercomplex  numbers  can  be 
used  to  develop  higher  dimensional  wavelet  transforms  suitable  for  signals  containing  low-dimensional  manifold 
structures  [22].  The  QWT  developed  here  could  play  an  interesting  role  in  the  analysis  of  (n  —  2)-D  manifold 
singularities  in  n-D  space.  This  efficient  hypercomplex  wavelet  representation  could  bring  us  new  ways  to  solve 
high-dimensional  signal  compression  and  processing  problems. 


3  The  Geometric  Hidden  Markov  Tree:  A  new  geometrical  statistical 
model 

3.1  Summary  of  results 

Traditional  wavelet-based  image  processing  algorithms  and  models  have  significant  shortcomings  in  their  treatment 
of  edge  contours.  The  standard  modeling  paradigm  exploits  the  fact  that  wavelet  coefficients  representing  smooth 
regions  in  images  tend  to  have  small  magnitude,  and  that  the  multiscale  nature  of  the  wavelet  transform  implies 
that  these  small  coefficients  will  persist  across  scale  (the  canonical  example  is  the  venerable  zero-tree  coder).  The 
edge  contours  in  the  image,  however,  cause  more  and  more  large  magnitude  wavelet  coefficients  as  we  move  down 
through  scale  to  finer  resolutions.  But  if  the  contours  are  smooth,  they  become  simple  as  we  zoom  in  on  them,  and 
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are  well  approximated  by  straight  lines  at  fine  scales.  Standard  wavelet  models  exploit  the  grayscale  regularity  of 
the  smooth  regions  of  the  image,  but  not  the  geometric  regularity  of  the  contours. 

We  have  built  a  model  that  accounts  for  this  geometric  regularity  by  capturing  the  dependencies  between 
complex  wavelet  coefficients  along  a  contour.  The  Geometric  Hidden  Markov  Tree  (GHMT)  assigns  each  wavelet 
coefficient  (or  spatial  cluster  of  wavelet  coefficients)  a  hidden  state  corresponding  to  a  linear  approximation  of  the 
local  contour  structure.  The  shift  and  rotational-invariance  properties  of  the  complex  wavelet  transform  allow  the 
GHMT  to  model  the  behavior  of  each  coefficient  given  the  presence  of  a  linear  edge  at  a  specified  orientation  — 
the  behavior  of  the  wavelet  coefficient  given  the  state.  By  connecting  the  states  together  in  a  quadtree,  the  GHMT 
ties  together  wavelet  coefficients  along  a  contour,  and  also  models  how  the  contour  itself  behaves  across  scale. 

3.2  Background  and  setup 

Images  have  two  salient  features  that  any  model  should  account  for:  they  contain  smooth,  homogeneous  regions, 
and  these  regions  are  separated  by  smooth  edge  contours.  That  is,  images  exhibit  grayscale  regularity  in  smooth 
regions  and  geometric  regularity  along  edge  contours.  Both  grayscale  regular  regions  and  geometric  regular  contours 
are  readily  characterized  by  their  multiscale  behavior.  In  a  sense,  they  both  become  “uninteresting”  as  we  zoom  in 
on  them;  at  fine  resolutions,  smooth  regions  are  essentially  flat  and  contours  are  essentially  straight. 

The  wavelet  transform  handles  the  grayscale  regularity  in  images  naturally,  and  as  a  result  has  emerged  as  the 
preeminent  tool  in  image  processing.  Wavelet  models  operate  under  the  paradigm  “wavelet  coefficients  representing 
smooth  regions  in  the  image  tend  to  be  small”.  This  idea  is  the  basis  for  many  state-of-the-art  wavelet  domain 
processing  algorithms  for  applications  including  compression  [23,24],  denoising  [25,26],  and  segmentation  [27]. 

An  effective  model  for  edge  structure  in  the  wavelet  domain  has  been  more  elusive.  At  fine  scales  in  the  wavelet 
domain,  many  coefficients  are  needed  to  build  up  an  edge  (the  number  roughly  doubles  from  scale  to  scale).  But 
since  the  edge  contour  itself  is  simple  (smooth),  the  values  of  the  wavelet  coefficients  are  “geometrically  coherent” 
in  that  they  exhibit  strong  dependencies,  a  fact  which  current  models  do  not  exploit. 

To  address  this  issue,  recent  research  in  applied  harmonic  analysis  has  focussed  on  developing  alternate  represen¬ 
tations  that  are  better  suited  for  edges  [28-32].  While  these  new  representations  have  theoretically  nice  properties, 
they  do  not  yet  enjoy  the  same  widespread  success  in  practical  applications  as  wavelets. 

Instead  of  proposing  an  alternate  representation,  we  explore  a  method  of  compensating  for  contour  regularity 
directly  in  the  wavelet  domain.  The  approach  combines  three  recent  developments  in  image  modeling:  complex 
wavelets  [33],  hidden  Markov  trees  [25],  and  multiscale  geometry  modeling  [34]. 

The  complex  wavelet  transform  (GWT)  [33,35],  like  the  real  wavelet  transform,  decomposes  an  image  using  basis 
functions  that  are  local  in  time  and  frequency.  Unlike  the  real  wavelet  transform,  complex  wavelets  are  also  local 
in  orientation  (see  Figure  10),  approximately  shift-invariant  and  approximately  steerable.  As  a  result,  geometrical 
coherency  among  a  group  of  complex  wavelet  coefficients  is  easy  to  identify  and  exploit. 

In  Section  3.4,  we  will  develop  the  geometric  hidden  Markov  tree  (GHMT),  a  statistical  model  for  images  in 
the  complex  wavelet  domain.  The  GHMT  assigns  a  hidden  state  to  each  group  of  six  complex  wavelet  coefficients 
that  corresponds  to  the  local  linear  edge  structure  present  in  the  image  (see  Figure  14).  The  presence  of  an  edge 
constrains  the  GWT  coefficients  to  behave  in  a  certain  manner,  just  as  they  are  “constrained”  to  be  small  when 
representing  smooth  regions. 

The  GHMT  captures  the  contour  structure  in  the  image  by  imposing  multiscale  dependencies  between  these 
hidden  states.  Since  each  contour  in  the  image  is  smooth,  their  local  linear  behavior  is  closely  related  across  scale 
(see  Figure  11).  These  relationships  are  quantified  with  transition  probabilities  between  the  hidden  states  of  parent 
and  child  GWT  coefficients  on  the  wavelet  quadtree.  An  appropriate  choice  of  transition  probabilities  allows  the 
GHMT  to  favor  hidden  state  configurations  that  are  geometrically  faithful;  at  fine  scales,  the  contour  is  essentially 
straight,  and  we  expect  little  innovation  in  the  hidden  state  sequence. 

In  Section  3.5,  we  apply  the  model  to  the  problem  of  feature  extraction.  Given  an  image,  we  use  a  fast  optimiza¬ 
tion  algorithm  to  find  the  state  configuration  with  greatest  joint  likelihood.  The  resulting  hidden  state  configuration 
gives  us  an  estimate  of  the  contour  structure  in  the  image  across  a  range  of  scales;  an  example  is  shown  in  Figure  15. 
The  pictures  in  Figure  15  are  completely  parameterized;  they  were  redrawn  from  the  “vector  graphics”  information 
(line  segment  orientations)  inherent  in  the  state  configuration. 
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Figure  10:  Real  parts  of  the  six  2D  wavelet  basis  functions  for  a  given  scale  and  location.  The  basis  functions  are  local  in 
space,  frequency,  and  direction;  each  responds  only  to  edges  at  certain  orientations. 


Figure  11:  The  linear  behavior  of  the  curve  on  the  right  is  closely  related  across  scale.  The  dashed  line  on  the  left  shows 
the  linear  fit  at  one  scale,  and  the  dotted  line  shows  the  fits  at  the  next  finer  scale. 


3.3  Complex  wavelets  and  local  geometry 

The  complex  wavelet  transform  decomposes  an  image  X(s),s  G  in  terms  of  basis  functions  that  are  shifted  in 
dilated  versions  of  six  different  mother  wavelets 

“Jo.fc  +  51  XI  51  (25) 

beBi>iofcez2 

where  =  2^'if;^{2^s  —  k).  The  and  V’j  ^(•s)  are  complex  valued,  with  the  real  and  imaginary  parts  of 

forming  an  approximate  Hilbert  pair.  The  CWT  uses  six  subbands  B  —  {15, 45, 75,-15,  —45,  —75},  labeled  for  the 
directional  information  they  give.  Like  the  real  wavelet  transform,  each  subband  can  be  arranged  on  a  quadtree  with 
nodes  indexed  by  scale  and  location.  The  real  parts  of  the  for  one  scale  and  location  are  shown  in  Figure  10. 
The  CWT  has  several  properties  which  make  it  attuned  to  characterizing  geometrical  structure^: 

Shift-invariance:  Let  {c^  be  the  CWT  coefficients  of  an  image  at  a  scale  j.  Then  the  corresponding  scale 
j  CWT  coefficients  for  any  shift  of  X  can  be  calculated  by  interpolating  the 

Rotational-invariance:  Similarly,  the  CWT  coefficients  for  any  rotation  of  X  can  be  calculated  by  interpolating 
the 

Directional  selectivity:  Along  with  being  local  in  time  and  frequency,  the  i/'j  k  local  in  orientation. 

As  a  result  of  these  three  properties,  the  CWT  coefficients  at  one  scale  and  location  can  provide  a  characterization 
of  local  geometrical  structure.  For  a  CWT  basis  function  centered  on  location  k  in  the  plane,  local  edge  positions 
can  be  parameterized  by  a  distance  d  and  an  angle  0  relative  to  k  (see  Figure  12).  The  {d,9)  parameterization 
provides  a  nice  separation  in  the  edge  response  of  the  six  CWT  coefficients  (see  Figure  13):  d  controls  the  phase, 
while  9  controls  the  magnitude.  The  mapping  from  (d,  9)  to  the  corresponding  edge  response  is  (almost)  invertible. 

■^While  not  strictly  true,  theses  properties  hold  for  nearly  all  intents  and  purposes. 
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(a) 


(b) 


Figure  12:  The  local  linear  behavior  of  a  contour  inside  the  bold  box  in  (a)  is  parameterized  by  (b)  an  angle  6  and  offset  d. 


Figure  13:  Magnitude  edge  response  for  the  six  CWT  basis  functions  at  a  given  scale  and  location  as  a  function  of  6  for 
d  =  0.  The  phase  response  varies  linearly  with  d.  The  relative  magnitudes  are  tied  to  the  angle  9  of  local  geometrical 
structure  while  the  phase  is  tied  to  the  offset  d. 


The  local  geometry  is  manifest  in  the  CWT  coefficients;  simply  by  looking  at  their  values,  we  can  not  only  decide 
whether  or  not  an  edge  is  in  the  vicinity,  but  we  can  also  estimate  the  distance  d  and  angle  9.  An  example  of  local 
edge  structure  being  “read  off”  from  CWT  coefficients  is  shown  in  Figure  6. 


3.4  Geometric  hidden  Markov  tree  modeling 


The  wavelet-domain  hidden  Markov  tree  [25]  (HMT)  is  a  general  purpose  statistical  image  model  underlying  powerful 
denoising  [26]  and  segmentation  [27]  algorithms.  The  HMT  separates  the  wavelet  coefficients  cj  of  an  image  into 
two  categories,  S  and  L,  and  uses  a  different  statistical  model  for  each.  Over  smooth  regions  in  the  image,  the 
wavelet  coefficients  are  expected  to  have  small  magnitude;  the  HMT  models  these  type  S  coefficients  as  coming 
from  a  low-variance  Gaussian  distribution.  Over  regions  with  edges  or  texture,  the  wavelet  coefficients  can  have 
larger  magnitude;  these  type  L  coefficients  are  modeled  with  a  high-variance  Gaussian.  Of  course,  the  type  of  each 
wavelet  coefficient  is  not  known  a  priori;  it  can  be  interpreted  as  a  hidden  state  that  controls  the  marginal 
distribution®  of  c® 


P{cq,kkj,k  =  m) 


7rCT„ 


■  rn  G  {S,  L}. 


(26) 


To  model  the  dependencies  between  the  wavelet  coefficients,  the  HMT  sets  up  a  Markov- 1  dependency  struc¬ 
ture  on  hidden  states  across  scale.  The  relationships  between  parent  and  child  coefficients®  are  parameterized  by 
transition  probability  matrix 


Amra  =  Prob  (child  state  =  mlparent  state  =  n) .  (27) 

^Typically,  the  variances  will  also  vary  with  the  scale  j 

®The  wavelet  coefficients  of  an  image  can  be  naturally  arranged  on  a  quadtree,  where  “parent  nodes”  at  a  scale  j  split  into  four 
“child”  nodes  at  scale  j  +  1. 
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Figure  14:  The  geometric  hidden  Markov  tree  model.  The  tree  of  hidden  states,  shown  on  the  left,  models  the  multiscale 
behavior  of  the  contours.  Associated  with  each  hidden  state  is  a  set  of  six  complex  wavelet  coefficients  (drawn  as  black  dots 
on  the  right). 


Since  smooth  regions  of  the  image  at  one  scale  will  break  down  into  smooth  regions  at  the  next  scale,  we  expect 
Ass  to  be  close  to  1. 

Rather  than  using  a  single  “Gaussian  with  large  variance”  state  to  model  contours  in  the  image,  the  geometric 
HMT  (GHMT)  breaks  the  L  state  down  into  many  geometry  states,  each  corresponding  to  a  different  type  of  local 
linear  contour  behavior.  The  states  are  drawn  from  a  dictionary  of  possible  {d,9)  values,  qj^k  G  S  U  {{dm,  9m)}, 
and  we  use  the  same  state  for  all  six  GWT  coefficients  {V'yfcjfc  for  a  given  scale  and  location  (j,  k). 

As  discussed  in  Section  3.3,  each  type  of  local  linear  contour  structure  produces  distinct  behavior  in  the  GWT 
coefficients.  Given  a  geometrical  state  qj^k  =  {dm, 9m),  we  model  the  six  GWT  coefficients  as  coming  from  an 
improper  Gaussian  whose  “mean”  is  the  one  dimensional  subspace  of  response  values  of  for  edges 

of  all  different  heights  at  {dm,  9m)-  We  have 

p  =  (dm,  9m))  OC  exp  dist({c5,fe}&,  V/(2CTg))  (28) 

where  dist({cj  is  the  distance  of  the  GWT  coefficients  {cj  to  the  subspace  and  controls 

the  variance  around  this  subspace. 

The  transition  probabilities  will  favor  small  innovations  in  geometry  over  large.  To  each  state  {dm,  9m),  there 
corresponds  a  line  £m  (as  in  Figure  12(b)).  The  Amn  are  assigned  based  on  the  Hausdorff  distance  AD{£m,£n) 
between  lines  £m  and  in  restricted  to  a  square  in  the  plane.  A  small  Hausdorff  distance  corresponds  to  little  change 
in  geometry  between  parent  and  child,  and  results  in  a  high  probability.  We  use  Amn  oc  e“ 

The  GHMT,  illustrated  in  Figure  14,  is  a  general  purpose  statistical  model  for  the  complex  wavelet  coefficients 
of  images,  and  thus  can  be  applied  to  a  range  of  image  processing  problems.  In  the  next  section,  we  present  a 
simple  feature  extraction  algorithm. 

3.5  Application:  Feature  extraction 

The  hidden  states  in  the  GHMT  are  more  than  just  a  modeling  device;  they  correspond  to  semantic  contour 
information.  Since  edge  contours  are  perceptually  the  most  important  features  in  an  image,  knowledge  of  the 
underlying  states  is  interesting  in  its  own  right. 

Given  an  image,  the  Markov-1  quadtree  structure  of  the  GHMT  allows  us  to  find  the  joint  maximum-likelihood 
hidden  state  sequence  {qj^k}  using  the  Viterbi  algorithm  [36].  The  result  is  a  set  of  {d,9)  estimates  at  every  node 
tied  together  by  the  geometry  model.  The  collection  of  estimate  states  {qj}k}k  at  a  scale  j  can  be  used  to  “sketch” 
the  contour  structure  in  an  image;  the  finer  the  scale,  the  more  details  are  included. 

An  example  of  geometrical  feature  extraction  using  GHMT  state  estimation  is  shown  in  Figure  15.  The  contour 
sketches  are  re-created  from  the  {d,  9)  values  inferred  by  the  Viterbi  algorithm.  The  estimated  hidden  state  values 
are  equivalent  to  a  vector  graphics  representation  of  the  contour  sketch,  and  can  be  used  to  form  an  extremely  low 
bitrate  semantic  approximation  to  the  image. 

3.6  Conclusions  and  Future  Work 

Despite  their  success,  traditional  wavelet-based  models  have  significant  shortcomings  in  their  treatment  of  edge 
structure  in  images.  We  have  introduced  a  modeling  framework  that  accounts  for  geometrically  smooth  edge 

^We  could  make  the  distribution  in  (28)  proper  by  putting  a  prior  on  the  heights  of  the  edges  in  the  image. 
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Figure  15:  Contour  sketches  at  two  different  scales  of  the  “cameraman”  image  shown  in  Figure  6.  The  sketches  were  created 
directly  from  the  (d,  9)  values  of  the  geometrical  states  estimated  by  the  Viterbi  algorithm. 


structure.  The  geometrical  HMT  models  spatial  clusters  of  complex  wavelet  coefficients  according  to  an  underlying 
hidden  state  specifying  a  linear  approximation  of  the  local  contour  structure.  By  imposing  dependencies  on  these 
hidden  states,  we  are  able  incorporate  a  characterization  of  geometrical  structure  into  the  model.  Future  areas  of 
work  include  the  application  of  the  model  to  other  signal  processing  applications,  such  as  compression  and  denoising. 
Another  fruitful  area  of  work  lies  in  the  combination  of  this  model  with  other  geometric  models,  and  the  pursuit  of 
new  harmonic  bases  motivated  by  geometry. 


4  Surflet  representation  and  compression  of  higher  dimensional  func¬ 
tions  containing  smooth  discontinuities 

4.1  Summary  of  results 

Discontinuities  are  prevalent  in  data,  and  often  represent  the  key  information  of  interest.  Efficient  representations 
for  such  discontinuities  are  important  for  many  potential  signal  processing  applications,  including  compression,  but 
standard  Fourier  and  wavelet  representations  fail  to  efficiently  capture  the  structure  of  the  discontinuities.  These 
problems  are  most  notable  in  image  processing,  where  progress  has  been  made  in  modeling  and  representing  one¬ 
dimensional  edge  discontinuities  along  curves.  Little  work,  however,  has  been  done  on  efficient  representations 
for  higher  dimensional  functions,  or  for  handling  higher  orders  of  smoothness  in  discontinuities.  We  consider  a  class 
of  TV-dimensional  Horizon  functions  containing  a  smooth  singularity  in  —  1  dimensions,  which  serves  as  a 
boundary  between  two  constant  regions.  We  introduce  the  surflet  representation  for  approximation  and  compression 
of  functions  in  this  class.  Surflets  allow  a  multiscale,  piecewise  polynomial  approximation  of  the  discontinuity.  We 
propose  a  compression  algorithm  using  surflets  that  comes  within  a  logarithmic  factor  of  the  optimal  rate-distortion 
performance  for  this  function  class.  Equally  important,  this  algorithm  can  be  implemented  using  knowledge  of  the 
A-dimensional  function,  without  explicitly  estimating  the  (A  —  l)-dimensional  discontinuity. 

4.2  Problem  definition  and  setnp 

We  consider  functions  of  A  variables  that  contain  a  smooth  discontinuity  which  is  a  function  of  A  —  1  variables. 
We  denote  vectors  using  boldface  characters.  Let  x  €  [0, 1]^,  and  let  Xi  denote  the  Fth  element  of  the  vector  x. 
We  denote  the  first  A  —  1  elements  of  x  by  y,  i.e.,  y  =  [xi,X2,  -  ■  ■  ,xm-i]  G  [0, 

4.2.1  Smoothness  model  for  discontinuities 

We  first  define  the  notion  of  smoothness  that  is  used  for  modeling  discontinuities.  A  function  of  A  —  1  variables  is 
said  to  have  smoothness  of  order  A  >  0,  where  A  =  r  -|-  a,  r  is  an  integer,  and  0  <  a  <  1  if  the  following  criteria 
are  met  [37, 38]: 

•  All  iterated  partial  derivatives  with  respect  to  the  A  —  1  directions  up  to  order  r  exist  and  are  continuous. 
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Figure  16:  Example  Horizon-class  functions  for  N  =  2  and  N  =  3. 


•  All  such  partial  derivatives  of  order  r  satisfy  a  Lipschitz  condition  of  order  a  (also  known  as  a  Holder 
condition).® 

We  denote  the  space  of  such  functions  by  C^.  Observe  that  when  K  is  an  integer,  includes  as  a  subset  the 
traditional  space  (where  the  function  has  K  =  r  -h  1  continuous  partial  derivatives). 


4.2.2  Multidimensional  Horizon-class  functions 

Let  5  be  a  function  of  —  1  variables  in  such  that 


5:  [0,1]^-!^  [0,1]. 


We  define  the  function  f  of  N  variables  such  that 

/:  [0,1]^  ^{0,1} 


according  to  the  following: 


/(x) 


1,  XN  >  b{y) 

0,  XN  <  b{y). 


The  function  /  is  known  as  a  Horizon-class  function  [39],  where  the  smooth  function  b  defines  a  horizon  between 
values  0  and  1. 

As  shown  in  Figure  16,  when  N  =  2  such  a  function  can  be  interpreted  as  an  image  containing  a  smooth 
discontinuity  that  separates  a  black  region  (0-valued)  below  from  a  white  region  (1-valued)  above.  For  N  =  3,  f 
represents  a  cube  with  a  two-dimensional  smooth  surface  cutting  across  the  cube  dividing  it  into  two  regions  — 
black  below  the  surface,  and  white  above  it. 


4.2.3  Problem  formulation 

Our  goal  is  to  encode  /,  a  Horizon-class  function  of  N  variables,  and  design  a  compression  algorithm  that  accom¬ 
plishes  this  with  optimal  asymptotic  rate-distortion  behavior.  We  use  the  squared-L2  distortion  metric  to  measure 
distortion  between  /  and  fp,  the  approximation  provided  by  the  compression  algorithm  using  R  bits: 

D2{f,fR)=  f  if-hf- 

Jxe[o.i]« 

We  emphasize  that  our  algorithm  approximates  f  in  N  dimensions.  The  approximation  fp,  however,  is  Horizon-class 
—  our  algorithm  implicitly  provides  an  approximation  bp  to  the  smooth  discontinuity  b.  That  is, 

function  g  G  Lip  a.  if  \g(y  -|-  h)  —  g(y)|  <  C|h|“  for  all  y,  h. 


XN  >  bniy) 
XN  <  bp{y) 
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for  some  bn.  From  the  definition  of  A^-dimensional  Horizon-class  functions,  it  is  easy  to  see  that 

^2(/,/fl)  =  f  {f-hf 

Jxe[o,i]« 

=  [  \b-bn\ 

Jye[o.i]«-i 

=  D,{b,bn).  (29) 

Hence,  optimizing  for  squared-L2  distortion  between  /  and  fn  is  equivalent  to  optimizing  for  Li  distortion  between 
b  and  bn- 

4.2.4  Compression  strategies  and  asymptotic  performance 

We  assume  that  the  coder  is  provided  explicitly  with  the  function  /.  As  can  be  seen  from  the  above  formulation,  all 
of  the  critical  information  about  the  function  /  is  contained  in  the  discontinuity  b.  One  would  expect  any  efficient 
coder  to  exploit  such  a  fact.  Methods  through  which  this  is  achieved  may  vary. 

One  can  imagine  a  coder  that  explicitly  encodes  b.  Knowledge  of  b  could  be  provided  from  an  external  “ora¬ 
cle”  [30],  or  b  could  conceivably  be  estimated  from  the  provided  data  /.  Wavelets  provide  one  efficient  method  for 
compressing  the  smooth  function  b.  Cohen  et  al.  [40]  describe  a  tree-structured  wavelet  coder  that  can  be  used  to 
compress  b  with  rate-distortion  performance 


Di{b,bn) 


The  work  of  Clements  [38]  (extending  Kolmogorov  and  Tihomirov  [37])  establishes  that  this  rate-distortion  per¬ 
formance  matches  the  metric  entropy  of  the  function  class  for  b.  We  have  also  proved  that  no  coder  for  /  can 
outperform  such  a  Horizon-class  coder.  Thus,  this  wavelet  coder  is  optimal  in  terms  of  asymptotic  rate-distortion 
performance  on  coding  instances  of  /. 

In  practice,  however,  a  coder  is  not  provided  with  explicit  information  of  b,  and  a  method  for  estimating  b  from 
/  may  be  difficult  to  implement.  Estimates  for  b  may  also  be  quite  sensitive  to  noise  in  the  data.  We  propose 
a  compression  algorithm  that  operates  directly  on  the  fV-dimensional  data  in  /.  This  algorithm  assembles  an 
approximation  fn  that  is  Horizon-class  (that  is,  it  can  be  assembled  using  an  estimate  bn),  but  it  does  not  require 
explicit  knowledge  of  b.  We  have  proved  that  this  algorithm  achieves  the  following  asymptotic  rate-distortion 
performance 

K 


D2{f,fR) 


and  our  current  work  strongly  indicates  that  the  (log  R)  factor  can  be  removed  to  provide  optimal  rate-distortion 
performance.  Our  algorithm  may  also  be  more  easily  extended  to  similar  function  spaces  containing  smooth  dis¬ 
continuities.  Our  spatially  localized  approach,  for  example,  allows  for  changes  in  the  “transient  variable”  (assumed 
throughout  to  be  xn). 


4.3  The  surflet  dictionary 

In  this  section,  we  define  a  discrete  dictionary  of  fV-dimensional  atoms,  called  surflets,  which  can  be  used  to 
construct  approximations  to  the  Horizon-class  function  /.  Each  surflet  consists  of  a  dyadic  hypercube  containing 
a  Horizon-class  function,  with  a  discontinuity  defined  by  a  smooth  polynomial.  Section  4.4  describes  a  specific 
compression  algorithm  for  surflet  approximations. 
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(a)  (b)  (c)  (d) 


Figure  17:  Example  surQets  with  N  =  2,  designed  for  smoothness  (a)  K  G  (1,2],  (b)  K  G  (2,3],  (c)  K  G  (3,4].  (d)  Tiled 
surhet  approximation. 


4.3.1  Motivation  —  Taylor’s  theorem 

Construction  of  the  surflet  dictionary  is  motivated  by  the  following  property.  If  6  is  a  function  of  —  1  variables 
in  C^,  Taylor’s  theorem  states  that 

N-l 

b(y  +  h)  =  b(y)  +  jy  XI  (y)  ' 

I  N-t 

+  ^  X!  (y)  ■  ^*1^*2  +  ■  ■  ■ 

il,*2  =  l 

1  w-1 

^  X/ 

+  OdIhliX,  (30) 

where  by^...yg^  refers  to  the  iterated  partial  derivatives  of  b  with  respect  to  yi,. . .  ,yi  in  that  order.  Note  that  there 
are  {N  —  1)^  £’th  order  derivative  terms. 

Thus,  over  a  small  domain,  the  function  b  is  well  approximated  using  an  r’th  order  polynomial  (where  the 
polynomial  coefficients  correspond  to  the  partial  derivatives  of  b  evaluated  at  y).  Clearly,  then,  one  method  of 
approximating  &  on  a  larger  domain  would  be  to  assemble  a  piecewise  polynomial  approximation,  where  each 
polynomial  is  derived  from  the  local  Taylor  approximation  of  b.  Consequently,  these  piecewise  polynomials  may  be 
used  to  assemble  a  Horizon-class  approximation  of  the  function  /.  Surflets  provide  the  TV-dimensional  framework 
for  constructing  such  approximations  and  can  be  implemented  without  explicit  knowledge  of  b  or  its  derivatives. 


4.3.2  Definition 

A  dyadic  hypercube  X  C  [0, 1]^  at  scale  j  G  TV  is  a  domain  that  satisfies 

A  =  (/?l2-^  (/3i  -t  1)2-^)  X  •  •  •  X  ]/?iv2-C  {Pn  +  1)2-^')  (31) 


with  Pi,  P2,  ■  ■  ■ ,  Pn  G  {0, 1,  •  •  ■ ,  2-^  —  !}•  We  explicitly  denote  the  (TV  —  l)-dimensional  hypercube  subdomain  of  X 
as 


F  =  [Pi2-P  {pi  +  1)2-^)  X  •  •  •  X  [PN-i2-p  {pN-i  +  1)2"^). 


(32) 


The  surflet  s(A;  •)  is  a  Horizon-class  function  over  the  dyadic  hypercube  X. 
y  =  [xi,X2,  •  ■  ■  ,  xat-i],  we  have 


s(X;x) 


1,  XN  >  p(y) 
0,  otherwise. 


For  X  G  A  with  corresponding 


where  the  polynomial  p(y)  is  defined  as  follows 


N-l  N-l 

p(y)  =  So  +  X  +  X  H - 

ii—1  —  l 

N-l 

+  y  ]  Sr,i^^i2,...,ij,yiiyi2  '  '  '  Vir- 

,•••  ,ir  —  l 
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Figure  18:  Example  surHets  with  N  =  3,  designed  for  smoothness  (a)  K  £  (1,  2],  (b)  K  £  (2,  3]. 


The  polynomial  coefficients  known  as  the  surflet  coefficients.^  We  note  here  that,  in  some  cases, 

a  surflet  may  be  identically  0  or  1  over  the  entire  domain  X. 

A  surflet  s(Ai)  approximates  the  function  /  over  the  dyadic  hypercube  X.  One  can  cover  the  entire  domain  [0, 1]'^ 
with  a  collection  of  dyadic  hypercubes  (possibly  at  different  scales),  and  use  surffets  to  approximate  /  over  each 
of  these  smaller  domains.  For  N  =  3,  these  surffets  together  look  like  piecewise  smooth  “surfaces”  approximating 
the  function  /.  Therefore,  we  coin  the  term  surffets  to  suggest  “surface” -lets.  Figure  17  illustrates  a  collection  of 
surffets  with  N  =  2  and  shows  an  approximation  obtained  by  combining  localized  surffets.  Figure  18  illustrates  a 
pair  of  surffets  with  N  =  3. 

4.3.3  Discretization 

We  obtain  a  discrete  surflet  dictionary  by  quantizing  the  set  of  allowable  surffet  coefficients.  For  £  £  {0, 1, . . .  ,r}, 
the  surffet  coefficient  at  scale  j  G  N  is  restricted  to  values  {n  •  Aij}n^z,  where  the  stepsize 

(33) 

The  necessary  range  for  n  may  depend  on  the  function  b.  However,  all  derivatives  are  locally  bounded,  and  so  the 
relevant  discrete  surffet  dictionary  is  actually  finite  for  any  realization  of  /. 

These  quantization  stepsizes  are  carefully  chosen  to  ensure  the  proper  fidelity  of  surffet  approximations  without 
requiring  excess  bitrate.  The  key  idea  is  that  higher-order  terms  may  he  quantized  with  lesser  precision,  without 
increasing  the  residual  error  term  in  the  Taylor  approximation  (30).  In  fact,  Kolmogorov  and  Tihomirov  [37] 
implicitly  used  this  concept  to  establish  the  metric  entropy  for  the  class  . 

4.4  Compression  using  surflets 

4.4.1  Overview 

Using  surffets,  we  propose  a  tree-based  multiresolution  approach  to  approximate  and  encode  /.  The  approximation 
is  arranged  on  a  2^-tree,  where  each  node  in  the  tree  at  scale  j  represents  a  hypercube  of  sidelength  2“1 .  Every  node 
is  either  a  leaf  node  (hypercube),  or  has  2^  children  nodes  (children  hypercubes  that  perfectly  tile  the  volume  of 
the  parent  hypercube).  Each  node  in  the  tree  is  labeled  with  a  surffet.  Leaf  nodes  provide  the  actual  approximation 
to  the  function  /,  while  interior  nodes  are  useful  for  predicting  and  encoding  their  descendants.  This  framework 
allows  for  adaptive  approximation  of  /  —  many  small  surffets  may  be  used  at  fine  scales  for  more  complicated 
regions,  while  few  large  surffets  will  suffice  to  encode  simple  regions  of  /  (such  as  those  containing  all  0  or  1). 

Section  4.4.2  discusses  techniques  for  determining  the  proper  surffet  at  each  node.  Section  4.4.3  mentions  a 
method  for  pruning  the  tree  depth  according  to  the  function  /.  Section  4.4.4  describes  the  performance  of  a  simple 
surffet  encoder  acting  only  on  the  leaf  nodes.  Section  4.4.5  presents  a  more  advanced  surffet  coder,  using  a  top-down 
predictive  technique  to  exploit  the  correlation  among  surffet  coefficients. 

4.4.2  Surflet  Selection 

Consider  a  node  at  scale  j  that  corresponds  to  a  dyadic  hypercube  X,  and  let  Y  be  the  {N  —  l)-dimensional 
subdomain  of  A  as  defined  in  (32). 

^Because  the  ordering  of  terms  j/i ^  •  •  ■  yi^  within  a  monomial  is  not  important,  only  (^”*'^~^)  monomial  coefficients  (not  (N  —  1)^) 

need  to  be  encoded  for  order  t.  We  preserve  the  slightly  redundant  notation  for  ease  of  comparison  with  (30). 
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In  a  situation  where  the  coder  is  provided  with  explicit  information  about  the  discontinuity  b  and  its  derivatives, 
determination  of  the  surflet  at  this  node  may  proceed  as  implied  in  Section  4.3.  Specifically,  the  coder  may  construct 
the  Taylor  expansion  of  b  about  any  point  y  €  Y  and  quantize  the  polynomial  coefficients  according  to  (33) .  To  be 
precise,  we  choose 

y=  [/3l2-^/322-^...,/3^,_l2-^] 

and  call  this  a  characteristic  point.  We  refer  to  the  resulting  surflet  as  the  quantized  Taylor  surflet.  Let  p  denote 
the  Taylor  polynomial  centered  at  y.  From  (30)  we  have 

max|6(y')  -p(y')|  =  0(2"^^). 

y'eY 

If  we  let  p  denote  the  polynomial  with  coefficients  quantized  according  to  (33),  it  can  be  proved  that 

max|6(y')  -p{y')\  =  0(2"^*^). 

y'eY 

It  follows  that  the  squared-L2  error  of  the  quantized  Taylor  surflet  approximation  of  /  obeys 

D2{f,  s{X))  =  I  (/(x)  -  s(X;  x))"  =  0(2-^('"+^-i)).  (34) 

Jxex 

As  discussed  in  Section  4.2.4,  a  coder  is  not  provided  with  explicit  information  of  b.  It  is  therefore  important 
to  mention  a  technique  that  can  obtain  a  surflet  estimate  directly  from  the  data  /.  We  assume  that  there  exists  a 
technique  to  compute  the  squared-L2  error  between  a  given  surflet  s{X)  and  the  function  /  on  the  dyadic  block, 

D2{f,s{X))=  [  (/(x)-s(A;x))2. 

Jxex 

In  such  a  case,  we  may  search  the  finite  surflet  dictionary  for  the  minimizer  of  this  error.  We  refer  to  the  resulting 
surflet  as  the  L2-best  surflet.  This  surflet  will  necessarily  obey  (34)  as  well.  Sections  4.4.4  and  4.4.5  discuss  the 
coding  implications  of  using  each  type  of  surflet. 

4.4.3  Organization  of  Surflet  Trees 

Given  a  method  for  assigning  a  surflet  to  each  node,  it  is  also  necessary  to  determine  the  proper  dyadic  segmentation 
for  the  tree  approximation.  This  may  be  accomplished  using  the  CART  (or  Viterbi)  algorithm  in  a  process  known 
as  tree-pruning  [39,41].  Tree-pruning  acts  in  a  bottom-up  procedure,  determining  whether  to  prune  the  tree  beneath 
each  node  (leaving  it  as  a  leaf  node).  Various  criteria  exist  for  making  such  a  decision.  In  particular,  the  rate- 
distortion  optimal  segmentation  may  be  obtained  by  minimizing  the  Lagrangian  rate-distortion  cost  D  -\-  XR  for  a 
penalty  term  A. 


4.4.4  Leaf  Encoding 


An  initial  approach  toward  a  surflet  coder  would  encode  a  tree  segmentation  map  denoting  the  location  of  leaf  nodes, 
along  with  the  quantized  surflet  coefficients  at  each  leaf  node.  We  have  proved  that,  using  either  the  quantized 
Taylor  surflets  or  the  L2-best  surflets,  the  asymptotic  rate-distortion  performance  of  this  coder  obeys 


D2{f,  Jr) 


logi?A 


Thus  this  simple  coder  is  near-optimal  in  terms  of  rate-distortion  performance.  In  addition,  it  can  be  implemented 
using  L2“best  surflets,  which  are  derived  directly  from  the  function  /. 


4.4.5  Top-down  Predictive  Encoding 

Achieving  the  optimal  rate-distortion  performance  requires  a  slightly  more  sophisticated  coder.  In  particular,  the 
above  encoding  technique  fails  to  exploit  the  correlation  among  nearby  surflets.  In  this  section,  we  briefly  describe 
a  top-down  surflet  coder  that  achieves  the  optimal  performance  by  predicting  surflet  parameters  from  previously 
encoded  values. 

Suppose  we  have  a  tree  segmentation,  where  each  node  is  labeled  with  a  quantized  Taylor  surflet. We  encode 
the  entire  tree  starting  with  the  root  node,  and  proceeding  from  the  top  down.  Given  a  quantized  surflet  s{Xj)  at 
an  interior  node  at  scale  j,  we  may  encode  its  children  surflets  (scale  j  -I-  1)  according  to  the  following  procedure. 

later  address  the  related  procedure  using  L2-best  surflets. 
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1)  (Parent-child  prediction).  Let  Yj  be  the  subdomain  of  Xj,  and  let  Yj+i  C  Yj  be  the  single  dyadic  hypercube  at 

scale  j  -I-  1  that  shares  the  same  characteristic  point  with  Yj.  Thus,  for  each  surflet  s(Xj_|_i)  with  subdomain 
on  Yj+i,  every  coefficient  of  s(Xj_|_i)  is  also  a  surflet  coefficient  of  (the  previously  encoded)  s{Xj),  but  more 
precision  must  be  provided  to  achieve  (33).  The  coder  should  provide  the  necessary  bits. 

2)  (Child-neighbor  prediction).  We  now  use  surflets  encoded  on  ij+i  (in  Step  1)  to  predict  the  surflet  coefficients 

for  each  of  the  remaining  hypercube  children  of  Xj.  We  omit  the  precise  details  but  note  that  this  prediction 
operates  according  to  (30),  with  ||h||  ~ 

We  have  proved  that  the  number  of  bits  required  to  encode  each  surflet  using  the  above  procedure  is  constant  with 
respect  to  the  scale  j.  It  then  follows  that  the  asymptotic  rate-distortion  performance  of  this  coder  obeys 

^2(/,7fl)~  . 

This  coder  achieves  the  optimal  asymptotic  rate-distortion  performance.  The  additional  information  encoded  at 
interior  nodes  provides  the  key  to  efficiently  encoding  the  leaf  nodes. 

The  encoding  procedure  as  described  above  assumes  a  quantized  Taylor  surflet  for  each  node.  Our  current 
work  strongly  indicates  that  a  similar  predictive  coding  scheme  can  be  developed  using  L2-best  surflets,  and  that 
this  scheme  will  achieve  the  same  optimal  performance.  The  development  of  such  a  technique,  and  proof  of  its 
performance,  will  be  discussed  at  the  conference. 

4.5  Conclusions  and  Future  Work 

The  surflet-based  compression  framework  that  we  describe  provides  a  sparse  representation  of  higher  dimensional 
functions  with  smooth  discontinuities.  We  present  a  tractable  method  to  encode  piecewise  smooth  polynomials  to 
approximate  such  functions.  Some  of  the  insights  that  we  gain,  namely,  quantization  of  higher-order  terms  with 
lesser  precision,  and  predictive  coding  to  decrease  bitrate,  can  be  used  to  solve  to  more  sophisticated  compression 
problems.  In  addition,  our  method  only  requires  knowledge  of  the  higher  dimensional  function,  not  the  smooth 
discontinuity. 
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